This folder contains the data and code necessary to replicate all results in "How Do Americans Explain Their
Party Identification and Out-partisan Animosity?" by Anthony Fowler, Gregory A. Huber, Rongbo Jin, and Lilla V. Orr.

All analyses were conducted using StataNow 19.5.

Please contact Anthony Fowler (anthony.fowler@uchicago.edu) with any questions.

Files in this repository
code.do: Stata code to execute all analyses
whypid_yougov.dta: YouGov survey data
whypid_prolific.dta: Prolific survey data on why people identify with their party
whyFT.dta: Prolific survey data on why people feel the way they do about the other party
HandCodedSample.dta: A sample of observations from the YouGov survey with automated and human codings
ExpertSurveyAnonymous.dta: Anonymized results from our expert survey

Codebook for whypid_yougov.dta
sample: string variable indicating the month of the survey
id: anonymized, unique identifier for each respondent
weight: survey weights
why: open-ended responses to our question about why respondents identify with their party
pid7: 7-point party identification; 1 = strong Democrat, 2 = not strong Democrat, 3 = lean Democrat, 4 = pure independent, 5 = lean Republican, 6 = not strong Republican, 7 = strong Republican
policy/disdain/candidates/performance/socialization/groups: binary indicators for whether the open-ended response was classified as fitting each of our categories
woman/white/age40/income70/northeast/midwest/west: binary indicators for gender, race, age, income, and geography; age40 is 1 for respondents who are 40 and older, and income70 is 1 for respondents with incomes of $70k or greater
extreme: indicator for whether the respondent identifies as an extreme liberal or extreme conservative
think_vs_are: indicator for which version of the open-ended question the respondent received in the May 2025 survey (see paper for details)
when_asked_first: indicator for whether the "when" question was asked before the "why" question in the May 2025 survey

Codebook for whypid_prolific.dta
id: anonymized, unique identifier for each respondent
why: open-ended responses to our question about why respondents identify with their party
policy/disdain/candidates/performance/socialization/groups/none: binary indicators for whether the open-ended response was classified as fitting each of our categories
nonetype: string variable describing the kind of answer given, for responses that did not fit any of our categories
totpolicy/totdisdain/totcandidates/totperformance/totsocialization/totgroups: variables ranging from 0 to 4 indicating the number of coders who applied each category to each response
pid7: 7-point party identification; 1 = strong Democrat, 2 = not strong Democrat, 3 = lean Democrat, 4 = pure independent, 5 = lean Republican, 6 = not strong Republican, 7 = strong Republican

Codebook for whyFT.dta
id: anonymized, unique identifier for each respondent
PID7: 7-point party identification; 1 = strong Democrat, 2 = not strong Democrat, 3 = lean Democrat, 4 = pure independent, 5 = lean Republican, 6 = not strong Republican, 7 = strong Republican
tot_ variables: the number of coders (between 0 and 4) who applied each category to that response
pid_ft_why: open-ended responses to our question about why respondents feel the way they do about the other party
pid_ft_value: thermometer rating of the other party
woman/white/age40/income70/northeast/midwest/west: binary indicators for gender, race, age, income, and geography; the age indicator is 1 for respondents who are 40 and older, and the income indicator is 1 for respondents with incomes of $70k or greater

Codebook for HandCodedSample.dta
yougovid: identifier for each respondent
why: open-ended response to our question about why respondents identify with their party
All remaining variables indicate whether GPT and each of the four authors applied each category to that response.

Codebook for ExpertSurveyAnonymous.dta
whypid_ variables: Expert predictions about the percent of public responses to the party identification question that will apply to each category
whyft_ variables: Expert predictions about the percent of public responses to the partisan affect question that will apply to each category
whypidreally_ variables: Expert assessments of the percent of the public for whom each category is actually an important cause of party identification
whyftreally_ variables: Expert assessments of the percent of the public for whom each category is actually an important cause of their partisan affect



